fix(console): fix intermittent login failures in dut_console tests by lipxu · Pull Request #23342 · sonic-net/sonic-mgmt

lipxu · 2026-03-26T07:02:44Z

Fix two intermittent failures in dut_console tests: an accumulation-buffer fix for the Password prompt detection race, and a splitlines()[0] fix for reliable TMOUT value extraction in test_idle_timeout.

Description of PR

Summary:

Fix two independent intermittent failures in dut_console tests:

ssh_console_conn.py: login_stage_2() checked re.search(pwd_pattern, output), where output is only the most recent read_channel() chunk. The DUT's Password: prompt can arrive split across multiple TCP reads (e.g. Pa + ssword:), so no single chunk matches, the password is never sent, and create_duthost_console intermittently fails with "Socket is closed" (~1 in 5 runs). Fix: check return_msg (the accumulated read buffer) instead of output.
test_idle_timeout.py: splitlines()[-1] could return a partial prompt string (e.g. admin@hostname:) instead of the numeric TMOUT value when the prompt was not fully stripped from the command output. Fix: use splitlines()[0] to read the first output line, which holds the numeric value.
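The first failure mode can be reproduced without a DUT. The sketch below is a minimal simulation, not the actual login_stage_2() code: FakeChannel, login(), and the pattern r"assword" are hypothetical stand-ins, and only the names output, return_msg, pwd_pattern, and password_sent mirror the PR.

```python
import re
from collections import deque


class FakeChannel:
    """Hypothetical stand-in for a console channel that returns preset chunks."""

    def __init__(self, chunks):
        self._chunks = deque(chunks)
        self.sent = []

    def read_channel(self):
        # Return the next chunk, or an empty string once the data is exhausted.
        return self._chunks.popleft() if self._chunks else ""

    def write_channel(self, data):
        self.sent.append(data)


def login(channel, password, use_accumulated, max_reads=10):
    """Send the password once the prompt is seen; return whether it was sent."""
    pwd_pattern = r"assword"  # assumed pattern, for illustration only
    return_msg = ""           # accumulated read buffer
    password_sent = False
    for _ in range(max_reads):
        output = channel.read_channel()  # most recent chunk only
        return_msg += output
        haystack = return_msg if use_accumulated else output
        if not password_sent and re.search(pwd_pattern, haystack, flags=re.I):
            channel.write_channel(password + "\n")
            password_sent = True
    return password_sent


split_reads = ["login: admin\r\nPa", "ssword: "]
old_behaviour = login(FakeChannel(split_reads), "secret", use_accumulated=False)
new_behaviour = login(FakeChannel(split_reads), "secret", use_accumulated=True)
```

With the prompt split as Pa + ssword:, the per-chunk check never matches, so the password is never written; the accumulated check matches as soon as the second chunk arrives.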

Both fixes were validated on internal branch dev/xuliping/20260325_202511_console_login_fix across 5 full test runs with no failures.

Related: follows up on #23295 (blank Enter fix in the same login path — already merged).

Type of change

Back port request

Approach

What is the motivation for this PR?

dut_console tests were failing intermittently (~1 in 5 runs) with "Socket is closed" errors during console login. Root cause: the DUT's Password: prompt sometimes arrives split across multiple TCP reads, so per-chunk pattern matching never succeeds. A second, independent failure in test_idle_timeout was caused by splitlines()[-1] returning a prompt fragment instead of the numeric TMOUT value.

How did you do it?

ssh_console_conn.py: Changed re.search(pwd_pattern, output) to re.search(pwd_pattern, return_msg) in login_stage_2(), where return_msg is the accumulated read buffer across all chunks.
test_idle_timeout.py: Changed splitlines()[-1] to splitlines()[0] in the TMOUT value extraction, so we always get the first output line regardless of trailing prompt remnants.
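The effect of the index change can be checked on sample strings. This is a small sketch under assumptions: the value 900 and the prompt admin@hostname: are made-up sample output, and the helper names are hypothetical rather than taken from test_idle_timeout.py.

```python
def tmout_from_last_line(output):
    """Old approach: take the last line of the command output."""
    return output.splitlines()[-1]


def tmout_from_first_line(output):
    """New approach: take the first line, with surrounding whitespace stripped."""
    return output.strip().splitlines()[0].strip()


# Assumed sample outputs of `echo $TMOUT`, with and without a leftover prompt.
clean_output = "900"
output_with_prompt = "900\nadmin@hostname:"

old_value = tmout_from_last_line(output_with_prompt)
new_value = tmout_from_first_line(output_with_prompt)
```

When the prompt is stripped correctly both helpers agree; only when a prompt remnant trails the value does the last-line variant pick up the prompt instead of the number.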

How did you verify/test it?

Ran all dut_console test cases on a physical testbed using internal branch dev/xuliping/20260325_202511_console_login_fix for 5 full iterations. All tests passed with no failures.

Any platform specific information?

None — applies to all platforms using SSH console connections.

Supported testbed topology if it's a new test case?

N/A (bug fix only)

Documentation

N/A

…_timeout

When find_prompt() captures a partial prompt, netmiko's strip_prompt() may fail to remove the trailing prompt, causing splitlines()[-1] to return the prompt string instead of the numeric value. Using splitlines()[0] always picks the first meaningful output line, which is the actual numeric value.

Signed-off-by: Liping Xu <xuliping@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The Password: prompt from the DUT is sometimes split across multiple TCP reads (e.g., 'Pa' + 'ssword:'), causing re.search(pwd_pattern, output) to fail on each individual chunk. By checking return_msg (the accumulated read buffer) instead of output, we correctly detect the Password: prompt even when it arrives in fragments. This fixes the intermittent 'Socket is closed' failure in create_duthost_console, where about 1 in 5 runs would fail because the password was never sent.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Liping Xu <xuliping@microsoft.com>

mssonicbld · 2026-03-26T07:02:52Z

/azp run

azure-pipelines · 2026-03-26T07:03:06Z

Azure Pipelines successfully started running 1 pipeline(s).

Add missing newline at end of file to fix the pre-commit fix-end-of-files check failure.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Liping Xu <xuliping@microsoft.com>

mssonicbld · 2026-03-26T10:24:29Z

/azp run

azure-pipelines · 2026-03-26T10:24:43Z

Azure Pipelines successfully started running 1 pipeline(s).

lolyu

✅ Approved — Clean, Well-Targeted Bug Fix

Two surgical fixes for intermittent dut_console test failures:

ssh_console_conn.py: Using accumulated return_msg instead of per-chunk output for password prompt detection — correctly handles TCP fragmentation splitting Password: across reads
test_idle_timeout.py: splitlines()[0] instead of splitlines()[-1] for reliable TMOUT extraction — avoids prompt fragments

Both fixes are minimal, well-documented, and validated across 5 full test iterations. LGTM 🚀

lolyu · 2026-03-27T00:53:30Z

tests/common/connections/ssh_console_conn.py

                # Search for password pattern / send password
-                if user_sent and not password_sent and re.search(pwd_pattern, output, flags=re.I):
+                # Use return_msg (accumulated) instead of output to handle cases where
+                # 'Password:' prompt is split across multiple TCP reads (e.g. 'Pa' + 'ssword:')


✅ Good fix — TCP fragmentation splitting Password: across reads is a classic race. Using the accumulated return_msg instead of the per-chunk output is the right approach. Well-commented too.

lolyu · 2026-03-27T00:53:37Z

tests/dut_console/test_idle_timeout.py

    duthost = duthosts[enum_rand_one_per_hwsku_hostname]
    logger.info("Get default session idle timeout")
-    default_tmout = duthost_console.send_command('echo $TMOUT')
+    default_tmout = duthost_console.send_command('echo $TMOUT').strip().splitlines()[0].strip()


💡 Nit: .strip().splitlines()[0].strip() could raise IndexError if send_command returns an empty string (unlikely but possible on connection issues). A defensive guard would be:

    lines = duthost_console.send_command('echo $TMOUT').strip().splitlines()
    default_tmout = lines[0].strip() if lines else ""

Not a blocker — the old code would also fail on empty output.
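The guard suggested in this comment could also be wrapped in a small helper so the empty-output case is handled in one place. This is only a sketch: parse_tmout and its default parameter are hypothetical names, not part of the PR or of the sonic-mgmt code base.

```python
def parse_tmout(raw_output, default=""):
    """Return the first non-empty line of `echo $TMOUT` output, or a default.

    An empty or whitespace-only string yields `default` instead of raising
    IndexError.
    """
    lines = raw_output.strip().splitlines()
    return lines[0].strip() if lines else default


# Example inputs (made-up values, not captured from a real DUT).
normal = parse_tmout("900\nadmin@hostname:")
empty = parse_tmout("")
```

A caller would still need to decide what an empty result means for the test, for example failing with a clear message rather than comparing against an empty string.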


…onic-net#23342) What is the motivation for this PR? dut_console tests were failing intermittently (~1 in 5 runs) with "Socket is closed" errors during console login. Root cause: the DUT's Password: prompt sometimes arrives split across multiple TCP reads, so per-chunk pattern matching never matches. A second independent failure in test_idle_timeout caused by splitlines()[-1] returning a prompt fragment instead of the numeric TMOUT value. How did you do it? ssh_console_conn.py: Changed re.search(pwd_pattern, output) to re.search(pwd_pattern, return_msg) in login_stage_2(), where return_msg is the accumulated read buffer across all chunks. test_idle_timeout.py: Changed splitlines()[-1] to splitlines()[0] in the TMOUT value extraction, so we always get the first output line regardless of trailing prompt remnants. How did you verify/test it? Ran all dut_console test cases on a physical testbed using internal branch dev/xuliping/20260325_202511_console_login_fix for 5 full iterations. All tests passed with no failures. Any platform specific information? None — applies to all platforms using SSH console connections. Supported testbed topology if it's a new test case? N/A (bug fix only) Signed-off-by: mssonicbld <sonicbld@microsoft.com>

mssonicbld · 2026-03-28T17:24:19Z

Cherry-pick PR to 202511: #23396

…23342) (#23396) What is the motivation for this PR? dut_console tests were failing intermittently (~1 in 5 runs) with "Socket is closed" errors during console login. Root cause: the DUT's Password: prompt sometimes arrives split across multiple TCP reads, so per-chunk pattern matching never matches. A second independent failure in test_idle_timeout caused by splitlines()[-1] returning a prompt fragment instead of the numeric TMOUT value. How did you do it? ssh_console_conn.py: Changed re.search(pwd_pattern, output) to re.search(pwd_pattern, return_msg) in login_stage_2(), where return_msg is the accumulated read buffer across all chunks. test_idle_timeout.py: Changed splitlines()[-1] to splitlines()[0] in the TMOUT value extraction, so we always get the first output line regardless of trailing prompt remnants. How did you verify/test it? Ran all dut_console test cases on a physical testbed using internal branch dev/xuliping/20260325_202511_console_login_fix for 5 full iterations. All tests passed with no failures. Any platform specific information? None — applies to all platforms using SSH console connections. Supported testbed topology if it's a new test case? N/A (bug fix only) Signed-off-by: mssonicbld <sonicbld@microsoft.com> Co-authored-by: Liping Xu <108326363+lipxu@users.noreply.github.com>


…onic-net#23342) What is the motivation for this PR? dut_console tests were failing intermittently (~1 in 5 runs) with "Socket is closed" errors during console login. Root cause: the DUT's Password: prompt sometimes arrives split across multiple TCP reads, so per-chunk pattern matching never matches. A second independent failure in test_idle_timeout caused by splitlines()[-1] returning a prompt fragment instead of the numeric TMOUT value. How did you do it? ssh_console_conn.py: Changed re.search(pwd_pattern, output) to re.search(pwd_pattern, return_msg) in login_stage_2(), where return_msg is the accumulated read buffer across all chunks. test_idle_timeout.py: Changed splitlines()[-1] to splitlines()[0] in the TMOUT value extraction, so we always get the first output line regardless of trailing prompt remnants. How did you verify/test it? Ran all dut_console test cases on a physical testbed using internal branch dev/xuliping/20260325_202511_console_login_fix for 5 full iterations. All tests passed with no failures. Any platform specific information? None — applies to all platforms using SSH console connections. Supported testbed topology if it's a new test case? N/A (bug fix only) Signed-off-by: selldinesh <dinesh.sellappan@keysight.com>

lipxu and others added 2 commits March 26, 2026 07:01

github-actions bot requested review from YatishSVC, bingwang-ms and yanmo96 March 26, 2026 07:03

[dut_console]: Fix missing trailing newline in test_idle_timeout.py

37f17ab


lolyu approved these changes Mar 27, 2026


lolyu merged commit fbc5443 into sonic-net:master Mar 27, 2026
18 of 19 checks passed

lolyu added the Request for 202511 branch label Mar 27, 2026

vmittal-msft added the Approved for 202511 branch label Mar 28, 2026

mssonicbld added the Created PR to 202511 branch label Mar 28, 2026

mssonicbld mentioned this pull request Mar 28, 2026

[action] [PR:23342] fix(console): fix intermittent login failures in dut_console tests #23396

Merged

12 tasks

mssonicbld added Included in 202511 branch and removed Created PR to 202511 branch labels Mar 30, 2026

lolyu merged 3 commits into sonic-net:master from lipxu:fix/dut-console-login-fixes


4 participants
